Abstract: In this paper we consider distributed allocation
problems with memory constraints. Firstly, we propose
a tractable relaxation of the problem of optimal symmetric
allocations from [1]. The approximated problem is based on
the Q-error function, and its solution approaches the solution
of the initial problem as the number of storage nodes in the
network grows. Secondly, exploiting this relaxation, we are able
to formulate and to solve the storage allocation problem for
memory-limited distributed storage systems (DSS) with arbitrary
memory profiles. Finally, we discuss the extension to the case
of multiple data objects stored in the DSS.

I. Introduction 

In recent years, more and more attention has been given to wireless
distributed storage systems, the so-called wireless caching
networks, which are expected to relieve the network bandwidth
bottleneck in future-generation wireless networks caused by the
growth of wireless data traffic from applications such as on-line
video streaming and web browsing. It is worth mentioning that
today's wireless networks have more and more available network
bandwidth, thanks to new communication technologies and to the fact
that the cell size continues to decrease. This implies that one of
the next problems to be considered in the distributed storage
context is related to the limited amount of storage memory available
in the system, rather than to the network parameters of the system.
For instance, the memory limitation can appear in the following
situations: 1) when the amount of data to store is very large (e.g.,
in order to improve the service of on-line video streaming, a large
choice of video files is offered to a user); 2) when the data
related to some application is stored over the user devices (e.g.,
in a Device-to-Device communication network), while the device
memory reserved by this application is limited; 3) in the multi-user
scenario with a large number of users, where the data is stored with
a high redundancy, thus improving the quality of experience (QoE)
perceived by the users, but also leading to large memory volumes
stored in the network.

Therefore, in this work we focus on memory-limited distributed
storage systems. We study the problem of storing data objects
(files) in a set of storage nodes, each of them having some maximum
memory volume available for use. For simplicity, it is assumed that
all storage nodes can be accessed successfully with the same
probability $p$. We consider the problem of maximizing the
probability of successful recovery, and we aim to characterize the
optimal storage allocation, given the memory profile of the system
and both the number and the sizes of the stored files.

The problem is a generalization of the storage allocation
problem with unlimited memory considered in [1]. However,
if one directly extends the optimization problem of [1] to
the memory-limited case, it becomes difficult to handle. The
reason is that the main objective function, on which the result
from [1] is based, is in fact a complementary cdf of a binomial
distribution $B(n,p)$ with parameters $n$ and $p$. This objective
function is discrete and non-monotone, and its analysis is already
tedious in the original setting of [1]. So, it needs to be handled
very carefully in the memory-limited case, which is even more
involved. Therefore, before addressing the case with limited
memory, we make our first contribution by defining a relaxation of
the initial optimization problem from [1] by using a continuous
approximation. We use the fact that the probability function
$f_{B(n,p)}(i)$, $i \in \mathbb{N}_0$, of the binomial distribution
$B(n,p)$ can be written as

$$ f_{B(n,p)}(i) \approx \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(i-\mu)^2}{2\sigma^2}\right), \tag{1} $$

which is the normal distribution probability function with
parameters $\mu = np$ and $\sigma = \sqrt{np(1-p)}$, up to
corrections that vanish as $n \to \infty$. Based on this result, we
propose to relax the objective function using a Q-error function

$$ Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-u^2/2}\, du, \quad x \in \mathbb{R}^{+}. $$

The approximation is accurate even for moderate values of
$n$ (of the order of several dozen nodes present in the storage
network, see Section II).
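
As a quick numerical illustration of this approximation, the following minimal Python sketch compares the binomial probability function with the normal density of mean $np$ and standard deviation $\sqrt{np(1-p)}$, and evaluates $Q(x)$ through the complementary error function. The function names are illustrative assumptions and do not come from the paper; only the standard library is used.

```python
import math


def binom_pmf(i, n, p):
    """Exact probability f_{B(n,p)}(i) of the binomial distribution."""
    return math.comb(n, i) * p**i * (1.0 - p) ** (n - i)


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def q_function(x):
    """Gaussian tail Q(x), written with the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def max_pmf_gap(n, p):
    """Largest absolute gap between the binomial pmf and its normal fit."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    return max(abs(binom_pmf(i, n, p) - normal_pdf(i, mu, sigma))
               for i in range(n + 1))


if __name__ == "__main__":
    # The gap shrinks as the number of nodes n grows.
    for n in (10, 45, 200):
        print(n, max_pmf_gap(n, 0.2))
```

The gap is measured pointwise on the integers $0, \ldots, n$; other error measures (for example on the tail probabilities) could be used instead.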

Thanks to the proposed relaxation, we can treat the memory-limited
storage case. Our second contribution is in defining a tractable
optimization problem in the case of an arbitrary memory profile of
the network and in characterizing the optimal storage allocation in
this case. In particular, we define a relaxed optimization problem
in the memory-limited case and solve it for the case when there is
only one data file stored in the network (see Section III).
Moreover, a conjecture on the case of two stored data objects is
developed in Section IV, thus opening the problem of storing
multiple data objects in a memory-limited distributed storage
system. This is our third and last contribution.


II. Optimal Symmetric Allocations Revisited
A. System Model and State of the Art

Assume a distributed storage system with $N$ storage nodes.
A source stores a data object of normalized unit size that is
encoded and stored in a distributed manner over the system,
subject to a given total storage budget $T$ ($T$ is the inverse
of the rate of the underlying code). Let $x_i$ be the amount of
coded data stored in node $i$, $1 \le i \le N$. Then,

$$ \sum_{i=1}^{N} x_i \le T. \tag{2} $$

The data collector wishes to download and to recover (i.e.,
to decode) the stored data object. It is assumed that it accesses
each of the $N$ storage nodes independently with some access
probability $p$. One therefore aims to find an optimal allocation
$X = (x_1, x_2, \ldots, x_N)$, subject to (2), so that the
probability of recovering the data successfully is maximized.
Assuming that the data was encoded using an MDS code, the data
object can be recovered if the amount of data collected by the
collector is at least one unit. This translates into the following
optimization problem:

$$ X^* = \sup_{X} \sum_{S \in \mathcal{P}(\{1,\ldots,N\})} p^{|S|}(1-p)^{N-|S|}\, \mathbb{1}\Big[\sum_{j \in S} x_j \ge 1\Big], \tag{3} $$

subject to (2), where $\mathcal{P}(\{1,\ldots,N\})$ is the power set
of $\{1,\ldots,N\}$ and $\mathbb{1}[\cdot]$ is the indicator
function. This optimization problem can be simplified if the search
is restricted to the set of symmetric allocations, i.e., if $X^*$ is
assumed to belong to the following subset $\mathcal{X}$:

$$ \mathcal{X} = \bigcup_{n=1}^{N} \mathcal{X}_n \quad \text{with} \quad \mathcal{X}_n = \Big\{X : \sum_{i=1}^{N} x_i = T \text{ and } x_i \in \{0, T/n\}\Big\}. \tag{4} $$

Given that the collector accesses nodes uniformly at random,
the probability of recovery (objective function) does not
depend on the exact indexes of the non-zero allocations but rather
on the number of nodes $n$ used to symmetrically store the data
object. As each of the $n$ nodes used for storage is accessed
with probability $p$, the probability of successful recovery for
a given value of $n$ is given by [1]

$$ \sum_{i=\lceil n/T \rceil}^{n} f_{B(n,p)}(i), \tag{5} $$

and the optimization problem reads as follows:

$$ \text{(P1)}: \quad n^* = \sup_{1 \le n \le N} \sum_{i=\lceil n/T \rceil}^{n} f_{B(n,p)}(i), \tag{6} $$

where the set of corresponding optimal symmetric allocations
is $\mathcal{X}_{n^*}$. The expression above represents a discrete
non-monotone function in $n$. In order to find its supremum, it is
sufficient to restrict to the following subset of values of $n$:

$$ \mathcal{N} = \{\lfloor T \rfloor, \lfloor 2T \rfloor, \ldots, \lfloor LT \rfloor, N\}, \tag{7} $$

where $L = \lfloor N/T \rfloor$. This fact is also stated in equation (9) of [1].
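
To make the model concrete, here is a minimal Python sketch, under the assumptions stated in this subsection (independent access with probability $p$, recovery when at least one unit is collected). It evaluates the recovery probability of an arbitrary allocation by enumerating accessed subsets, evaluates the binomial tail for a symmetric allocation on $n$ nodes (recovery needs at least $\lceil n/T \rceil$ accessed nodes), and searches the candidate supports $\lfloor T \rfloor, \lfloor 2T \rfloor, \ldots, \lfloor LT \rfloor, N$ with $L = \lfloor N/T \rfloor$. All function names are hypothetical, and the budget is passed as a `Fraction` only to keep floors and ceilings exact.

```python
import math
from fractions import Fraction
from itertools import combinations


def recovery_probability(alloc, p):
    """Total probability of the accessed subsets whose stored amounts
    sum to at least one unit (exponential in the number of nodes)."""
    n_nodes = len(alloc)
    total = 0.0
    for size in range(n_nodes + 1):
        weight = p**size * (1.0 - p) ** (n_nodes - size)
        for subset in combinations(alloc, size):
            if sum(subset) >= 1:
                total += weight
    return total


def symmetric_success(n, budget, p):
    """Binomial tail P[B(n, p) >= ceil(n / T)] for n nodes storing T/n each."""
    k_min = math.ceil(Fraction(n) / budget)
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i)
               for i in range(k_min, n + 1))


def candidate_set(n_nodes, budget):
    """Candidate supports floor(T), floor(2T), ..., floor(LT), N."""
    big_l = math.floor(Fraction(n_nodes) / budget)
    cands = {math.floor(k * budget) for k in range(1, big_l + 1)}
    cands.add(n_nodes)
    return sorted(c for c in cands if c >= 1)


def solve_p1(n_nodes, budget, p):
    """Support size maximizing the binomial tail over the candidate set."""
    return max(candidate_set(n_nodes, budget),
               key=lambda n: symmetric_success(n, budget, p))


if __name__ == "__main__":
    print(candidate_set(45, Fraction(10)))
    print(solve_p1(45, Fraction(10), 0.2))
```

The subset enumeration is only meant as a cross-check for small $N$; for symmetric allocations the binomial tail gives the same value at a much lower cost.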

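Using this fact, the authors of [1] obtain the following result
(Theorems 3 and 4 in [1]):

$$ n^* = \begin{cases} \lfloor T \rfloor, & \text{if } p\lceil T \rceil \le 1; \\ \lfloor LT \rfloor \text{ or } N, & \text{if } (1-p)^{\lfloor T \rfloor} + 2\lfloor T \rfloor p(1-p)^{\lfloor T \rfloor - 1} \le 1. \end{cases} \tag{8} $$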


The interesting regime of parameters for practical applications
is $pT > 1$, as in this case the success probability is not
bounded away from 1. In the section below, we propose a more
tractable optimization problem based on a relaxation of (P1),
which gives a good estimate of the solution regions for $n^*$.


B. A Q-Function Approximation Applied to (P1)

Let $\mu = np$ and $\sigma = \sqrt{np(1-p)}$, for some¹ $n \in \mathbb{R}$ such
that $n \in [0, N]$ and $p \in [0, 1]$. Define the optimization problem
(P2) as:

$$ \text{(P2)}: \quad n^* = \sup_{n \in \mathbb{R}} Q\left(\frac{\lceil n/T \rceil - \mu}{\sigma}\right), \quad 1 \le n \le N. \tag{9} $$

¹Note that here $n$ is a real value and not an integer as before. However, as
it corresponds to $n$ above, we keep calling it $n$. To avoid abuse of notation
later on, it will always be mentioned if $n \in \mathbb{R}$.
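
A minimal Python sketch of the relaxed objective is given below, assuming the argument of $Q$ is $(\lceil n/T \rceil - np)/\sqrt{np(1-p)}$, which is the form used for $c(n)$ in Appendix A. Following Lemma 4 of Appendix A, the search is restricted to the candidate set $\{\lfloor T \rfloor, \ldots, \lfloor LT \rfloor, N\}$. The helper names are illustrative assumptions, and $0 < p < 1$ is required so that $\sigma > 0$.

```python
import math
from fractions import Fraction


def q_function(x):
    """Gaussian tail function Q(x) via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def p2_objective(n, budget, p):
    """Relaxed success probability for a support size n (0 < p < 1)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    threshold = math.ceil(Fraction(n) / Fraction(budget))
    return q_function((threshold - mu) / sigma)


def candidate_set(n_nodes, budget):
    """Candidate supports floor(T), floor(2T), ..., floor(LT), N."""
    budget = Fraction(budget)
    big_l = math.floor(Fraction(n_nodes) / budget)
    cands = {math.floor(k * budget) for k in range(1, big_l + 1)}
    cands.add(n_nodes)
    return sorted(c for c in cands if c >= 1)


def solve_p2(n_nodes, budget, p):
    """Candidate support maximizing the relaxed objective."""
    return max(candidate_set(n_nodes, budget),
               key=lambda n: p2_objective(n, budget, p))


if __name__ == "__main__":
    for p in (0.2, 0.05):
        print(p, solve_p2(45, 10, p))
```

Because `max` returns the first maximizer it meets, this sketch reports a single candidate even when several supports attain the same objective value.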



Figure 1. Objective function (probability of successful recovery) of problems
(P1) and (P2) as a function of $n$ for $N = 45$ and various cases of $pT$.
From top to bottom: $p = 0.2$ and $T = 10$ (o), $p = 0.1$ and $T = 10$ (□),
and $p = 0.05$ and $T = 10$ (•). For each case, (0) markers show
the solution to (P1) and (□) markers the solution to (P2). For $pT > 1$ and
$pT < 1$, the solution to both problems is the same; this is not the case for
$pT = 1$. Note that (P2) for the $pT = 1$ case has multiple solutions.

















Table I

Measuring the disparity between (P1) and (P2)

Size of DSS    α         β
N = 10         0.8823    0.904
N = 20         0.9048    0.9208
N = 45         0.9345    0.9532


(P2) is obtained from (P1) by applying the normal approximation
to $f_{B(n,p)}$ described in (1). Thanks to well-known
properties of the $Q(x)$ function, we can characterize the set
of solutions as follows:


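Theorem 1. The solution of (P2) is

$n^* = \lfloor T \rfloor$, if $pT < 1$ (Case 1);

$n^* \in \mathcal{N} \setminus \{N\}$, if $pT = 1$ (Case 2);

$n^* = \lfloor LT \rfloor$, if $1/T < p < (L+1)/N$ (Case 3);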

^ ~ I C+1. < n < c±l _ I - 1 _ 

’ N — P — n-Vlt ^ nVlt-Vt 

(Case 4); 

, ILT \, p > 

( 10 ) 


The outline of the proof of Theorem 1 is given in Appendix A.
In Fig. 1 we show the objective function of problems (P1)
and (P2) as a function of $n$ for $N = 45$ and different $(p, T)$
pairs. Note that, while in the cases $pT > 1$ and $pT < 1$ the
solution to both problems is the same, this is not the case for
$pT = 1$. Recall here that the range of interest for distributed
storage allocation problems is $pT > 1$. Table I includes a
measure of the disparity between (P1) and (P2) for $N = 10$,
$N = 20$ and $N = 45$. For a uniform grid of $p$ values in $[0, 1]$,
$T = 0 : 0.1 : N$ and increasing values of $N$, we compute the
following quantities:

• Fraction $\alpha$ of points in the grid for which $n^*_{(P1)} = n^*_{(P2)}$;

• Fraction $\beta$ of points in the subregion of the grid with
$pT > 1$ for which $n^*_{(P1)} = n^*_{(P2)}$.

Observe that both $\alpha$ and $\beta$ improve for larger values of $N$,
which indicates that our approximation gets tight in the limit
$N \to \infty$. The equivalence of (P2) and (P1) for large values of
$N$, together with the fact that the optimal symmetric allocation
$n^*$ approaches the optimal (asymmetric) one when $N$ goes to
infinity [1], gives us the asymptotically optimal solution of the
distributed storage allocation problem. As it is easier to deal
with (P2), we apply it to our case of interest, which is
distributed storage allocation in memory-limited systems.

III. One Single Data Object in Memory-Limited DSS

Let us consider the DSS of interest: a storage node
$i$ is assumed to have an available memory $M_i$, which can
be used to store the coded data. Let a data object of total
budget $T$ be stored in the DSS. Note that if $T < \min_i M_i$,
then the problem is equivalent to the memory-unlimited case.
Also, if $T > \sum_i M_i$, then the allocation solution does not
exist. So let us focus on the interesting region of $T$, which is
$\min_i M_i < T < \sum_i M_i$. With some abuse of notation, let
the set of memory-limited symmetric allocations of size $n$ be
defined as:

$$ \mathcal{X}_n = \{X : x_i \le M_i,\ x_i \in \{0, T/n\},\ \#(x_i = T/n) = n\}, \tag{11} $$

where $\#(x_i = a)$ denotes the number of elements in $X$ equal
to $a$.


A. Constant Memory Profile 

We start by developing an intermediate result for a DSS
with a constant memory limit $M$. Note that the solution to this
problem will differ from the unconstrained memory scenario
summarized by (10) in those cases where we are interested in
storing a large amount of data in a small set of nodes.
Because of the memory limit $M$, for $pT < 1$, the symmetric
minimum spreading solution might not be optimal.

We define the set of quasi-symmetric allocations of size $n$
and of memory volume $M$ as:

$$ \mathcal{X}^M_n = \{X : x_i \le M,\ \#(x_i = M) = n-1,\ \#(x_i = R) = 1\}, \tag{12} $$

with $R = T - M(n-1)$. Hence, in such an allocation we use the
complete memory $M$ in $n - 1$ nodes, and the rest of the data
object, i.e., $R < M$, is stored in an additional node, so that $n$
nodes are used in total. By (11) and (12), $\mathcal{X}_i$,
$i \in \mathbb{N}$, represents a set of symmetric allocations where
$i$ nodes are used for storing, and $\mathcal{X}^M_i$ is a set of
quasi-symmetric allocations where $i$ nodes are used.

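Define $n_{\min} = \lceil T/M \rceil$, and let $L_0$ be the smallest
integer such that² $n_{\min} \le \lfloor L_0 T \rfloor$. Finally, define
$\mathcal{N}_M = \{\lfloor L_0 T \rfloor, \ldots, \lfloor LT \rfloor, N\}$.
We have the following result:

²Note that $n_{\min}$ is the minimum number of nodes we can use to store the
budget $T$. Since $n_{\min}$ might not be contained in the set $\mathcal{N}$ in (7),
$\lfloor L_0 T \rfloor$ is the possible solution to the problem that is closest
to $n_{\min}$. We assume that $L_0 < L$. Otherwise the optimal $n^*$ is given by
Cases 3, 4, and 5 of (10) if $L_0 = L$, and $n^* = N$ if $L < L_0 < N$.

Lemma 1. Assume a limited-memory DSS for which $M_1 = \ldots = M_N = M$.
Let $p_0$ be the unique solution of the equation

$$ p \sum_{i=\lceil (1-R)/M \rceil}^{n_{\min}-1} f_{B(n_{\min}-1,p)}(i) + (1-p) \sum_{i=\lceil 1/M \rceil}^{n_{\min}-1} f_{B(n_{\min}-1,p)}(i) - \sum_{i=L_0}^{\lfloor L_0 T \rfloor} f_{B(\lfloor L_0 T \rfloor,p)}(i) = 0, \tag{13} $$

where $R$ is taken from (12) with $n = n_{\min}$. Then the set of optimal
storage allocations, maximizing the probability of successful recovery
in this case, is approximated by

$X^* = \mathcal{X}^M_{n_{\min}}$, if $pT < 1$ and $p < p_0$ (Case 1a);

$X^* = \mathcal{X}_{\lfloor L_0 T \rfloor}$, if $pT < 1$ and $p > p_0$ (Case 1b);

$X^* = \mathcal{X}_n$, $n \in \mathcal{N}_M \setminus \{N\}$, if $pT = 1$ (Case 2);

$X^* = \mathcal{X}_{\lfloor LT \rfloor}$, if $1/T < p < (L+1)/N$ (Case 3);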

Xn, 

L-Vl < 77 < -1- 1 

N n-Vlt ' nVlt-Vt 


(Case 4); 

^YLT\ I 

P ^ n-Vlt + nVlt-Vt ■ 


We now formulate a necessary condition for the FLmin allocation,
see Fig. 2(a), to be a better allocation than the symmetric
allocation:

Lemma 2. Denote a FLmin allocation by of non-zero 

■' '‘'min 

support Tlniin 06 

N 

j Then, X^^^^ is optimal if 


(14) 

Moreover, $X^*$ from (14) approaches the set of optimal
allocations when $N$ goes to infinity.

The outline of the proof of the lemma is given in Appendix B.
Note that cases 3, 4, 5 of (14) are equivalent to cases 3, 4, 5 of
(10). Case 2 is also similar, with the only exception that one
should now consider $\mathcal{N}_M$ instead of $\mathcal{N}$. The only
difference from (10) is therefore in the fact that, when $pT < 1$, the
minimum symmetric allocation is not always optimal anymore:
there exist values of $L_0$ and of $p$ for which the best allocation
is the quasi-symmetric one.

Example 1. Let $p = 0.1$, $T = 1.4$, $M = 0.5$ and $N > 3$. The
best allocation in this case is the quasi-symmetric one with two
$x_i$'s equal to 0.5 and one $x_i$ equal to 0.4.
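
The numbers of Example 1 can be checked with the short Python sketch below. It assumes the model of Section II (recovery when at least one unit is collected): with $n_{\min} - 1$ nodes storing $M$ and one node storing $R = T - M(n_{\min}-1)$, recovery needs at least $\lceil (1-R)/M \rceil$ full nodes when the remainder node is accessed and at least $\lceil 1/M \rceil$ otherwise. Function names are illustrative, and exact fractions are used to avoid rounding in the floors and ceilings.

```python
import math
from fractions import Fraction


def binom_tail(n, k_min, p):
    """P[B(n, p) >= k_min]."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(max(k_min, 0), n + 1))


def quasi_symmetric_success(budget, mem, p):
    """Success probability with n_min - 1 nodes storing M and one storing R."""
    n_min = math.ceil(budget / mem)
    rest = budget - mem * (n_min - 1)            # remainder R
    with_rest = binom_tail(n_min - 1, math.ceil((1 - rest) / mem), p)
    without_rest = binom_tail(n_min - 1, math.ceil(1 / mem), p)
    return p * with_rest + (1 - p) * without_rest


def symmetric_success(n, budget, p):
    """Success probability with n nodes storing T/n each."""
    return binom_tail(n, math.ceil(Fraction(n) / budget), p)


def smallest_l0(n_min, budget):
    """Smallest integer L_0 with n_min <= floor(L_0 * T)."""
    l0 = 1
    while math.floor(l0 * budget) < n_min:
        l0 += 1
    return l0


if __name__ == "__main__":
    p, T, M = Fraction(1, 10), Fraction(7, 5), Fraction(1, 2)
    n_sym = math.floor(smallest_l0(math.ceil(T / M), T) * T)
    print("quasi-symmetric:", quasi_symmetric_success(T, M, p))
    print("symmetric on", n_sym, "nodes:", symmetric_success(n_sym, T, p))
```

For these parameters the symmetric competitor uses $\lfloor L_0 T \rfloor = 4$ nodes with $0.35$ units each, which respects the memory limit $M = 0.5$.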

Remark 1. By using a Taylor expansion of the Q-function, one
can also get a tight approximation of $p_0$:


where 


T 

m > - 

Itmin 


N 

E' 

i— 


M? 




(16) 


(17) 


For $m < T/n_{\min}$, the symmetric minimal spreading
allocation, Fig. 2(b), has higher recovery probability. Besides,
all modifications to the FLmin allocation, defined by putting the
remainder of the memory in a different set of positions, see Fig. 2(a), achieve
the same recovery probability. Denote this set by 

The proof of Lemma 2 is given in Appendix C. Lemma 2
indicates that, depending on the memory profile, it may be
better to choose either the FLmin allocation or the symmetric
minimal-spreading allocation.

2) Arbitrary Memory Profile with pT > 1:


Po : 


[l/Ml -Lo 


+ \l/M^ - 


- 1 


[LoT\ 
1 - R 


1 — 1 — \/(nmin — 1)L-^'07’J 


M 


(15) 


Notation 2. Let rimax be the largest integer such that — 
Mjv-rimax- Lraax be the largest integer such 

^max ^ L'^niax-^J • 


> 


B. Arbitrary Memory Profile 

Now consider an arbitrary memory profile $M = (M_1, \ldots, M_N)$.
W.l.o.g., let $M_1 \le M_2 \le \ldots \le M_N$. For
this scenario, two possible optimal allocations have to be
considered for the case $pT < 1$ and another two for
the case $pT > 1$. We sketch these four scenarios in Fig. 2.
When $pT < 1$, the full-load minimum-support asymmetric
allocation, or FLmin allocation for short, uses the complete
memory of the nodes with the largest available memory. A small
fraction of residual data can be stored in any of the remaining
nodes in the network. The FLmin allocation is sketched in
Fig. 2(a). Alternatively, we can store the data object using
a symmetric minimal spreading allocation, see Fig. 2(b). For
$pT > 1$, the all-node maximum support allocation, or ANmax
allocation for short, uses all nodes in the system, defining
a quasi-symmetric allocation. The ANmax allocation is
sketched in Fig. 2(c). Also, a symmetric maximum-spreading
allocation can be used, see Fig. 2(d).

1) Arbitrary Memory Profile with pT < 1: 

Notation 1. Let $n_{\min}$ be the smallest integer such that
$\sum_{i=0}^{n_{\min}-1} M_{N-i} \ge T$. As before, let $L_0$ be the smallest
integer such that $n_{\min} \le \lfloor L_0 T \rfloor$.
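
The quantities of Notation 1 can be computed as in the following Python sketch. It assumes that $n_{\min}$ is the smallest number of largest-memory nodes whose combined memory is at least $T$, and it builds one FLmin allocation by filling the $n_{\min} - 1$ largest nodes completely and placing the residual data on the next largest node (one of several admissible positions for the remainder). Function names are illustrative and not taken from the paper.

```python
import math
from fractions import Fraction


def flmin_support(memory, budget):
    """Smallest number of largest-memory nodes whose total memory covers T."""
    total = 0
    for count, mem in enumerate(sorted(memory, reverse=True), start=1):
        total += mem
        if total >= budget:
            return count
    raise ValueError("the budget exceeds the total available memory")


def smallest_l0(n_min, budget):
    """Smallest integer L_0 with n_min <= floor(L_0 * T)."""
    l0 = 1
    while math.floor(l0 * budget) < n_min:
        l0 += 1
    return l0


def flmin_allocation(memory, budget):
    """One FLmin allocation; entries follow the memory profile sorted in
    decreasing order (full nodes first, then the remainder, then zeros)."""
    n_min = flmin_support(memory, budget)
    ordered = sorted(memory, reverse=True)
    full = ordered[: n_min - 1]
    rest = budget - sum(full)
    return full + [rest] + [0] * (len(ordered) - n_min)


if __name__ == "__main__":
    profile = [Fraction(3, 10), Fraction(1, 2), Fraction(1, 2)]
    T = Fraction(6, 5)
    print(flmin_support(profile, T), smallest_l0(3, T))
    print(flmin_allocation(profile, T))
```

The remainder always fits on the chosen node, because the cumulative memory of the $n_{\min}$ largest nodes is at least $T$ by construction.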


Lemma 3. Let the ANmax allocation be
$(M_1, \ldots, M_{N-n_{\max}}, a, \ldots, a)$,
where $a = (T - \sum_{i=1}^{N-n_{\max}} M_i)/n_{\max}$. Then the ANmax
allocation is optimal if

$$ m(N - n_{\max}) > \sum_{i=1}^{N-n_{\max}} M_i, \tag{18} $$

where $m$ is defined in (17). If (18) is not verified, then
the symmetric maximum-spreading allocation, Fig. 2(d), has
higher recovery probability.

The proof of Lemma 3 is given in Appendix D. Lemma 3
shows that, depending on the memory profile, it may be
better to choose either the ANmax allocation or the symmetric
maximum-spreading allocation.

Putting the results of Lemmas 2 and 3 together, we can state
the following:

Conjecture 1. Assume a DSS with an arbitrary memory
profile with $M_1 \le M_2 \le \ldots \le M_N$. Let
$\mathcal{N}_M = \{\lfloor L_0 T \rfloor, \ldots, \lfloor L_{\max} T \rfloor\}$.³ Then

³We assume that $L_{\max} \ge L_0$; otherwise $n^*$ will not exist.
Figure 2. Possible storage allocations for an arbitrary memory profile: (a) FLmin allocation; (b) symmetric minimal spreading allocation; (c) ANmax
allocation; (d) symmetric maximum-spreading allocation.

$X^*$ above approaches the set of optimal allocations when
$N$ goes to infinity.


IV. Extension: When More Than One Data Object
Is Stored in the DSS


In this work we have analyzed memory-limited DSS
for a single user. Most of our derivations are based on
a tractable approximation proposed for the unlimited memory
case. Given the above results, our future interest is to address
the multiuser case. We conclude the paper by briefly discussing
the case of two data objects to store ($K = 2$). Let the
objects have total budgets $T_1$ and $T_2$, respectively. Also, let
the probability that the data collector downloads file 1
be denoted by $p_1$, and the probability that it downloads file 2
by $p_2 = 1 - p_1$. We assume the system is memory-limited. Note
that if $T_1 + T_2 < \min_i M_i$, then the problem is equivalent to the
memory-unlimited case. Also, if $T_1 + T_2 > \sum_i M_i$, then the
allocation solution does not exist. So, the interesting interval
of $T$'s is $N \min(M) < T_1 + T_2 < \sum_i M_i$. Let the
allocations for files 1 and 2 be denoted by $X_1$ and $X_2$. If
we restrict to symmetric allocations with support $n_1$ and $n_2$,
respectively, then the problem can be approximated by solving
the following optimization problem:


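$$ \text{(P4)}: \quad \sup_{(X_1, X_2)} \; p_1 Q\left(\frac{\lceil n_1/T_1 \rceil - \mu_1}{\sigma_1}\right) + p_2 Q\left(\frac{\lceil n_2/T_2 \rceil - \mu_2}{\sigma_2}\right), $$

with $\mu_i = n_i p$, $\sigma_i = \sqrt{n_i p (1-p)}$, for $i = 1, 2$.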


The following can be proven about the solution of (P4):

• W.l.o.g., let $p_1 > p_2$. To maximize (P4), we first allocate
object 1 given the memory profile $M_1, M_2, \ldots, M_N$ using
the results presented in Section III. Then, we allocate
object 2 using the residual memory profile.


• Let $p_1 = p_2$. In this case, game theory
suggests that three optimal strategies are possible: a) to
allocate the first data object and then the second
one; b) to proceed in the inverse order; c) to allocate the data
of the two objects in a mixed way.


V. Conclusion 

In this paper, we have considered a memory-limited DSS
storing one or two data files. The memory profile of the system
is assumed to be arbitrary; this case is therefore treated in all
its generality. We wish to emphasize the following points:

• For a memory-limited DSS, the optimal storage allocation
is not necessarily a symmetric one, even for a large
network size $N$ and even for a constant memory profile, when
all nodes have the same amount of memory available for
storage. This differs from the result obtained in the usual
memory-unlimited case, where the optimal allocation is
a symmetric one in the limit of large $N$.

• The result obtained for the access probability
$p$ can be combined with the result from [2], developed
for heterogeneous storage networks. Thus it is possible
to characterize optimal allocation solutions for
memory-limited, heterogeneous DSS.

• MDS codes, used to prove our results, are the most
storage-efficient erasure-correcting codes, but they are not
efficient complexity-wise, which makes them impractical
to use. It would be interesting to consider a more practical
code solution and to check how the optimal allocation
changes in this case.


Appendix A

Outline of the Proof of Theorem 1

First, the following lemma is stated (its proof is quite
straightforward and is omitted for the sake of space):


Lemma 4. Let $L = \lfloor N/T \rfloor$ as previously. Then the solution of
(P2) belongs to the set $\mathcal{N}$, given by (7).

Next, define

$$ c(n) = Q\left(\frac{\lceil n/T \rceil - \mu}{\sigma}\right) $$

with $\mu = np$, $\sigma = \sqrt{np(1-p)}$.

Owing to Lemma 4, (9) can be written as $\sup_{n \in \mathcal{N}} c(n)$. To
find a solution of this problem, three possible cases are to be
considered: $pT < 1$, $pT = 1$ and $pT > 1$. We discuss only
the latter case, $pT > 1$. The first two cases can be analyzed
similarly, and their result is stated directly in (10).

When $pT > 1$, $c(\lfloor iT \rfloor)$ is increasing with $i$, and
$\arg\sup_{1 \le i \le L} c(\lfloor iT \rfloor) = L$. Therefore, depending on
the value of $N$, $n^*$ is either $\lfloor LT \rfloor$ or $N$. Note that
$N = \lfloor LT \rfloor$ is a trivial case ($n^* = N$). So let
$N > \lfloor LT \rfloor$ and consider

$$ c(\lfloor LT \rfloor) - c(N) = Q\left(\frac{\sqrt{L}(1 - Tp)}{\sqrt{Tp(1-p)}}\right) - Q\left(\frac{L + 1 - Np}{\sqrt{Np(1-p)}}\right). \tag{19} $$


Two cases are to be distinguished:

a) $Np > L + 1$, for which $p > (L+1)/N > 1/T$: The expression
(19) is positive if the following is satisfied:

./F > L + l-Np 

y/Tp(l-p) y/jVp(l-p) 

This holds for 

So, under the condition above, $n^* = \lfloor LT \rfloor$. Note that, if
some $p$ satisfies (20), then it also satisfies $p > (L+1)/N$.

b) $1/T < p < (L+1)/N$: it can be verified that, for any value of $p$,
$c(\lfloor LT \rfloor) > c(N)$ and thus $n^* = \lfloor LT \rfloor$.
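
The two terms being compared in this appendix can be evaluated numerically. The Python sketch below is a hedged illustration: it uses the closed-form arguments $\sqrt{L}(1 - Tp)/\sqrt{Tp(1-p)}$ and $(L + 1 - Np)/\sqrt{Np(1-p)}$, that is, it treats $\lfloor LT \rfloor$ as $LT$ and assumes $N/T$ is not an integer. The function names are assumptions made for this example.

```python
import math


def q_function(x):
    """Gaussian tail function Q(x) via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def difference_of_objectives(n_nodes, budget, p):
    """Value of c(floor(LT)) - c(N) with L = floor(N / T), for 0 < p < 1."""
    big_l = math.floor(n_nodes / budget)
    first = q_function(math.sqrt(big_l) * (1.0 - budget * p)
                       / math.sqrt(budget * p * (1.0 - p)))
    second = q_function((big_l + 1 - n_nodes * p)
                        / math.sqrt(n_nodes * p * (1.0 - p)))
    return first - second


if __name__ == "__main__":
    # Parameters of Fig. 1 with pT = 2, and a point with 1/T < p < (L+1)/N.
    for p in (0.2, 0.105):
        print(p, difference_of_objectives(45, 10, p))
```

A positive value means that the support $\lfloor LT \rfloor$ gives a larger relaxed objective than the support $N$ for the chosen parameters.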


Appendix B

Outline of the Proof of Lemma 1

To show cases 2, 3, 4, 5, the proof is similar to the one
of Theorem 1. The only difference now is that the solution
should belong to $\mathcal{N}_M$ instead of $\mathcal{N}$. However, for
case 1, the symmetric allocations $\mathcal{X}_{\lfloor L_0 T \rfloor}$ are
not necessarily the best choice. One can show that they only have to
be compared with the quasi-symmetric allocations occupying the
smallest number of storage nodes, i.e., with the subvector
$(M, \ldots, M, R)$. It is easy to see that the probabilities of
successful recovery for the symmetric and quasi-symmetric minimum
spreadings, denoted respectively by $P_S$ and $P_{QS}$, are given by

$$ P_S = \sum_{i=L_0}^{\lfloor L_0 T \rfloor} f_{B(\lfloor L_0 T \rfloor,p)}(i), $$

$$ P_{QS} = p \sum_{i=\lceil (1-R)/M \rceil}^{n_{\min}-1} f_{B(n_{\min}-1,p)}(i) + (1-p) \sum_{i=\lceil 1/M \rceil}^{n_{\min}-1} f_{B(n_{\min}-1,p)}(i). $$

Moreover, both of them are monotonically increasing in
$p$, and $P_S(1/T) > P_{QS}(1/T)$ while $P_S(0) < P_{QS}(0)$. So,
there exists a unique parameter $p = p_0$ such that
$P_S(p_0) = P_{QS}(p_0)$, and one can find it by solving (13).


Appendix C
Proof of Lemma 2

We are going to use the Markov inequality, which was
also used in considering heterogeneous data allocations in [2].
Assuming an arbitrary allocation $X$ with a non-zero support
$n$, the optimization problem is approximated as:

$$ \text{(P3)}: \quad X^* = \sup_{X \text{ s.t. (2) holds}} \sum_{k=1}^{n} f_{B(n,p)}(k)\, P\Big(\sum_{i=1}^{k} Y_i \ge 1\Big), \tag{21} $$

where the $Y_i$ are i.i.d. random variables and $Y_i \sim p_X(x)$, where
$p_X(x)$ is the empirical probability distribution corresponding
to the non-zero support of $X$. The approximation here comes
from the fact that the $Y_i$'s are assumed to be i.i.d., i.e., the
probability distribution corresponding to random choice
without replacement is approximated by the probability
distribution corresponding to random choice with replacement.
By Markov's inequality,

$$ P\Big(\sum_{i=1}^{k} Y_i \ge 1\Big) \le E\Big(\sum_{i=1}^{k} Y_i\Big) = k m_X, $$

with $m_X$ being the mean of the distribution $p_X(x)$. Therefore,
the objective function in (21) can be approximated by

$$ m_X \sum_{k=0}^{n} k f_{B(n,p)}(k) = m_X n p. \tag{22} $$

Note that, if $X$ is a symmetric allocation, this quantity equals
$pT$, as $m_X = T/n$, so that, for $pT < 1$, the probability of
successful recovery is bounded away from 1. However, if $X$ is the
full-load allocation with the smallest support $n_{\min}$, the mean
$m_X$ is larger, and the objective function is bounded by a larger
quantity than $pT$.
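
For the symmetric case, the bound can be checked numerically with the small Python sketch below. It assumes a symmetric allocation on $n$ nodes with $T/n$ units each, computes the mean number of accessed nodes $\sum_k k f_{B(n,p)}(k) = np$, and compares the resulting Markov bound $pT$ with the exact binomial-tail success probability. Function names are illustrative assumptions.

```python
import math


def binom_pmf(k, n, p):
    """Exact probability f_{B(n,p)}(k) of the binomial distribution."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def symmetric_success(n, budget, p):
    """Exact success probability P[B(n, p) >= ceil(n / T)]."""
    k_min = math.ceil(n / budget)
    return sum(binom_pmf(k, n, p) for k in range(k_min, n + 1))


def markov_bound_symmetric(n, budget, p):
    """Mean stored amount per node (T / n) times the mean number of
    accessed nodes, which equals p * T for a symmetric allocation."""
    mean_per_node = budget / n
    mean_accessed = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
    return mean_per_node * mean_accessed


if __name__ == "__main__":
    n, T, p = 4, 2.0, 0.2
    print(symmetric_success(n, T, p), markov_bound_symmetric(n, T, p))
```

The bound is loose in general; it is only used here to compare how different allocation shapes limit the achievable success probability when $pT < 1$.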

Appendix D
Proof of Lemma 3

With the help of Markov's inequality, as for Lemma 2,
one obtains that the probability of successful recovery is upper
bounded by $p\big(T - \sum_{i=1}^{N-n_{\max}} M_i + m(N - n_{\max})\big)$ in the
all-node maximum-spreading case, and by $pT$ in the symmetric
maximum-spreading case. Hence, the condition (18) follows.